Introduction

Since our dataset from the previous task contains only 4 usable columns, it is not suitable for factor analysis. We will therefore use the geopol dataset for this task instead. It contains economic, demographic and health information about 41 countries across the world.

Factor analysis

Let us now perform a factor analysis of the dataset. After a brief exploration, we decided to use 4 factors for our analysis.
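The exploration itself is not shown here. One common way to do it, sketched below as an assumption rather than the procedure actually used, is to fit models with an increasing number of factors and compare the p-values of the sufficiency test. Because geopol is not reproduced in this text, the sketch uses R's built-in ability.cov covariance list as stand-in data.

```r
# Stand-in data: ability.cov ships with R (6 ability tests, 112 observations).
# For geopol one would pass the data frame as `x` instead of `covmat`.
candidate_k <- 1:2

p_values <- sapply(candidate_k, function(k) {
  fit <- factanal(factors = k, covmat = ability.cov, rotation = "varimax")
  unname(fit$PVAL)           # p-value of the test "k factors are sufficient"
})
names(p_values) <- paste0("k=", candidate_k)
print(signif(p_values, 3))
```

A small p-value indicates that k factors are not sufficient, so the smallest k whose test is not rejected is a natural candidate.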

print(fa4, digits=2, cutoff=.5, sort=TRUE)
## 
## Call:
## factanal(x = geopol[, c(1:length(geopol))], factors = 4, rotation = "varimax")
## 
## Uniquenesses:
## popu giph ripo rupo rlpo rspo eltp rnnr nunh nuth 
## 0.81 0.20 0.02 0.44 0.19 0.32 0.10 0.43 0.02 0.00 
## 
## Loadings:
##      Factor1 Factor2 Factor3 Factor4
## ripo -0.90                          
## rlpo -0.72                          
## eltp  0.78                          
## rnnr  0.65                          
## giph          0.64                  
## nunh          0.91                  
## rupo                  0.55          
## rspo                  0.55          
## nuth                          0.71  
## popu                                
## 
##                Factor1 Factor2 Factor3 Factor4
## SS loadings       2.97    1.99    1.31    1.20
## Proportion Var    0.30    0.20    0.13    0.12
## Cumulative Var    0.30    0.50    0.63    0.75
## 
## Test of the hypothesis that 4 factors are sufficient.
## The chi square statistic is 13.62 on 11 degrees of freedom.
## The p-value is 0.255

We can see that the 4 factors explain about 75% of the variability in the data, which is quite satisfactory. The formal test of whether 4 factors are sufficient gives a p-value of about 0.25, so the hypothesis is not rejected. However, the test assumes multivariate normality and the data are far from normal, so this result should be treated with caution.
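The figures in the output above can be cross-checked by hand. The sketch below is not part of the original analysis. It only re-enters the printed (rounded) values and assumes the standard degrees-of-freedom formula for the maximum-likelihood factor model, ((p - k)^2 - (p + k)) / 2, for p variables and k factors.

```r
# Values copied from the printed factanal output (rounded to 2 digits).
p <- 10                      # number of variables
k <- 4                       # number of factors
ss_loadings  <- c(2.97, 1.99, 1.31, 1.20)
uniquenesses <- c(0.81, 0.20, 0.02, 0.44, 0.19, 0.32, 0.10, 0.43, 0.02, 0.00)

# Proportion of variance per factor is SS loadings / p (standardized variables).
prop_var <- ss_loadings / p
cum_var  <- cumsum(prop_var)

# Total communality = p - sum of uniquenesses; it should match sum(ss_loadings).
total_communality <- p - sum(uniquenesses)

# Degrees of freedom and p-value of the sufficiency test.
dof   <- ((p - k)^2 - (p + k)) / 2
p_val <- pchisq(13.62, df = dof, lower.tail = FALSE)

print(round(cum_var, 2))
print(c(total_communality = total_communality, dof = dof))
print(round(p_val, 3))
```

The last cumulative value comes out at about 0.75 and the degrees of freedom at 11, in agreement with the printed output.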

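Let’s continue with a comment about the loadings. With the cutoff of 0.5 used in the printout, Factor 1 loads negatively on ripo and rlpo and positively on eltp and rnnr, Factor 2 loads on giph and nunh, Factor 3 on rupo and rspo, and Factor 4 on nuth alone. The variable popu has no loading above the cutoff, which is consistent with its high uniqueness of 0.81.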

The loadings on the individual factors are shown in the figure below. Red denotes a negative correlation with the factor, blue a positive one.
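The figure is not reproduced here. A plot of this kind could be drawn roughly as follows. The sketch uses R's built-in ability.cov data as a stand-in for geopol, and the red-white-blue palette is an assumption chosen to match the description above.

```r
# Stand-in fit: ability.cov ships with R; for geopol, use the fitted object instead.
fa <- factanal(factors = 2, covmat = ability.cov, rotation = "varimax")
L  <- unclass(loadings(fa))  # plain numeric matrix: variables x factors

# Diverging palette: red = negative, white = zero, blue = positive.
pal <- colorRampPalette(c("red", "white", "blue"))(21)

# Draw the loadings as a heatmap with factors on the x axis and variables on the y axis.
image(x = seq_len(ncol(L)), y = seq_len(nrow(L)), z = t(L),
      zlim = c(-1, 1), col = pal, axes = FALSE, xlab = "", ylab = "")
axis(1, at = seq_len(ncol(L)), labels = colnames(L))
axis(2, at = seq_len(nrow(L)), labels = rownames(L), las = 1)
```

Fixing zlim to the symmetric range from -1 to 1 keeps white at zero, so the sign of a loading can be read directly from the color.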

Finally, we compare the results with PCA. Let’s look at the loadings of the first three components, which together explain about 80% of the variability.

We can see that PCA also groups ripo, rlpo and eltp together, although rnnr fits with them less well. The pairs giph, nunh and rupo, rspo are also grouped very well, and nuth seems to form its own group in the PCA too.
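For reference, a PCA comparison of this kind might be set up as follows. This is a minimal sketch on the built-in mtcars data, standing in for geopol. The use of prcomp on standardized variables is an assumption, since the original PCA call is not shown.

```r
# Stand-in data: mtcars ships with R; replace with the geopol data frame.
pca <- prcomp(mtcars, center = TRUE, scale. = TRUE)

# Share of total variance explained by each component, and its running total.
var_share <- pca$sdev^2 / sum(pca$sdev^2)
cum_share <- cumsum(var_share)
print(round(cum_share[1:3], 2))

# Loadings (eigenvectors) of the first three components.
pca_loadings <- pca$rotation[, 1:3]
print(round(pca_loadings, 2))
```

Variables with large loadings of the same sign on the same component are the ones PCA groups together, which is what the comparison with the factor loadings relies on.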